Abstract: A framework is proposed that allows for a joint
description and optimization of both binary polar coding and
$2^m$-ary digital pulse-amplitude modulation (PAM) schemes such
as multilevel coding (MLC) and bit-interleaved coded modulation
(BICM). The conceptual equivalence of polar coding and
multilevel coding is pointed out in detail. Based on a novel 
characterization of the channel polarization phenomenon, rules 
for the optimal choice of the labeling in coded modulation 
schemes employing polar codes are developed. Simulation results 
regarding the error performance of the proposed schemes on the 
AWGN channel are included. 



I. Introduction 

Polar codes [1] are known as a low-complexity binary coding
scheme that provably approaches the capacity of arbitrary
symmetric binary-input discrete memoryless channels (B-DMCs).
The generalization to $M$-ary channels ($M > 2$) has been
the subject of various works, cf., e.g., [2], [3], [4]. However,
the topic of polar-coded modulation, i.e., the combination
of $M = 2^m$-ary digital modulation, especially digital PAM
(i.e., ASK, PSK, QAM), and binary polar codes for increased
spectral efficiency, has hardly been addressed so far. In [5], a
transmission scheme for polar codes with bit-interleaved coded
modulation (BICM) [6], [7] has been proposed, focusing on
the interleaver design.

In this paper, we discuss both the multilevel coding (MLC)
construction [8], [9] and BICM. We restrict our considerations
to memoryless channels like the AWGN channel (no fading).
In case of BICM, we follow an alternative approach that differs
from [5].

It has been observed (cf., e.g., [10]) that the MLC approach
is closely related to that of polar coding on a conceptual
level. Based on these similarities, we propose a framework that
allows us to completely describe both polar coding and $2^m$-ary
PAM modulation in a unified context. To this end, we introduce
so-called channel partitions. These transformations split
an arbitrary memoryless $2^m$-ary channel (e.g., the
equivalent-baseband PAM channel in case of PAM modulation) into $m$
binary-input memoryless channels (so-called bit channels).

We distinguish two classes of such binary partitions: sequential
and parallel binary partitions. For the latter, the
resulting bit channels are independent. This class is thus
applicable, e.g., to describe BICM. For sequential binary partitions, the
bit channels depend on each other in a well-defined order -
this class can be used for representing MLC. We show that
both binary polar coding and polar-coded modulation
may be described by the concatenation of binary partitions.
Considering the trade-off between power efficiency and
spectral efficiency, this unified description makes it possible
to design optimized constellation-dependent coding schemes
both for MLC and BICM.

Additionally, we provide an efficient method for a numerical 
evaluation of the performance of polar-coded modulation and 
present extensive numerical results for various settings. Using 
this method, we present a comprehensive comparison of polar- 
coded modulation based on MLC as well as on BICM, and we 
show results of a comparison to LDPC-coded modulation (for 
the latter only the common BICM approach is considered). 

The paper is organized as follows: In Sec. II, the framework
for a joint description of polar coding and $2^m$-ary PAM
modulation is developed. This framework is then used for
describing the polar coding construction in Sec. III, leading
to a novel interpretation of the polarization phenomenon.
The optimal combination of binary polar coding and $2^m$-ary
modulation is discussed in Sec. IV for the multilevel coding
approach and in Sec. V for bit-interleaved coded modulation
(BICM), followed by simulation results for the AWGN channel
in Sec. VI.

II. Channel Transforms 



A. Sequential Binary Partitions 

Let $\mathsf{W}: \mathcal{X} \to \mathcal{Y}$ be a discrete memoryless channel (DMC)
with input symbols $x \in \mathcal{X}$ (alphabet size $|\mathcal{X}| = 2^k$), output
symbols $y \in \mathcal{Y}$ from an arbitrary alphabet $\mathcal{Y}$, and mutual
information $I(X;Y)$.¹ We define an order-$k$ sequential binary
partition ($k$-SBP) $\varphi$ of $\mathsf{W}$ to be a channel transform

$$\varphi: \mathsf{W} \to \{\mathsf{B}_\varphi^{(0)}, \dots, \mathsf{B}_\varphi^{(k-1)}\} \tag{1}$$

that maps $\mathsf{W}$ to an ordered set of $k$ binary-input DMCs
(B-DMCs) which we refer to as bit channels. For any given $\mathsf{W}$,
such a $k$-SBP is characterized by a binary labeling rule $\mathcal{L}_\varphi$
that maps binary $k$-tuples bijectively to the $2^k$ input symbols
$x \in \mathcal{X}$:

$$\mathcal{L}_\varphi: [b_0, b_1, \dots, b_{k-1}] \in \{0,1\}^k \mapsto x \in \mathcal{X}. \tag{2}$$

The number of possible labelings equals $(2^k)!$.

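Each bit channel $\mathsf{B}_\varphi^{(i)}$ ($0 \le i < k$) of a $k$-SBP is supposed
to have knowledge of the output of $\mathsf{W}$ as well as of the
values transmitted over the bit channels of smaller indices
$\mathsf{B}_\varphi^{(0)}, \dots, \mathsf{B}_\varphi^{(i-1)}$. Thus, we have

$$\mathsf{B}_\varphi^{(i)}: \{0,1\} \to \mathcal{Y} \times \{0,1\}^i. \tag{3}$$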



¹ A short remark on the notation: channels are denoted by sans-serif fonts,
capital roman letters stand for random variables, while boldfaced symbols
denote vectors or matrices.
The mutual information between channel input and output of
$\mathsf{B}_\varphi^{(i)}$, assuming equiprobable input symbols, is therefore given
by

$$I(\mathsf{B}_\varphi^{(i)}) := I(B_i; Y \mid B_0, \dots, B_{i-1}) \tag{4}$$

which we refer to as the (symmetric) bit channel capacity of
$\mathsf{B}_\varphi^{(i)}$. (If $\mathsf{W}$ is a symmetric channel, this value in fact equals the
channel capacity.) The mutual information of $\mathsf{W}$ is preserved
under the transform $\varphi$, i.e.,

$$\sum_{i=0}^{k-1} I(\mathsf{B}_\varphi^{(i)}) = I(X;Y) \tag{5}$$

which directly follows from the chain rule of mutual
information [11], [12].
Considering polar-coded modulation, we show that the code
construction can be described by SBPs. We are particularly
interested in two properties of SBPs, namely the mean value and
the variance of the bit channel capacities, defined respectively
as

$$M_\varphi(\mathsf{W}) := \frac{1}{k} \sum_{i=0}^{k-1} I(\mathsf{B}_\varphi^{(i)}) \tag{6}$$

$$V_\varphi(\mathsf{W}) := \frac{1}{k} \sum_{i=0}^{k-1} \left( I(\mathsf{B}_\varphi^{(i)}) - M_\varphi(\mathsf{W}) \right)^2. \tag{7}$$

Clearly, from (5) the mean value $M_\varphi(\mathsf{W})$ in fact depends only
on the channel $\mathsf{W}$, rather than on the particular transform
$\varphi$. It represents the average (symmetric) capacity of $\mathsf{W}$ per
transmitted binary symbol. Obviously,



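$$0 \le M_\varphi(\mathsf{W}) \le 1 \tag{8}$$

holds for any DMC $\mathsf{W}$ and any SBP $\varphi$. The variance of an
SBP $\varphi$ is upper-bounded by

$$V_\varphi(\mathsf{W}) \le M_\varphi(\mathsf{W}) \left(1 - M_\varphi(\mathsf{W})\right) \tag{9}$$

with equality if and only if all $I(\mathsf{B}_\varphi^{(i)})$ are either 0 or 1. This follows
from

$$\begin{aligned}
V_\varphi(\mathsf{W}) &= \frac{1}{k} \sum_{i=0}^{k-1} I(\mathsf{B}_\varphi^{(i)})^2 - M_\varphi(\mathsf{W})^2 \\
&\le \frac{1}{k} \sum_{i=0}^{k-1} I(\mathsf{B}_\varphi^{(i)}) - M_\varphi(\mathsf{W})^2 \\
&= M_\varphi(\mathsf{W}) \left(1 - M_\varphi(\mathsf{W})\right)
\end{aligned} \tag{10}$$

and $0 \le I(\mathsf{B}_\varphi^{(i)}) \le 1$ for all $0 \le i < k$. Note that this upper
bound does not depend on the particular labeling $\mathcal{L}_\varphi$ but only
on the channel $\mathsf{W}$.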
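The mean value, the variance, and the bound (9) are easy to check numerically. The following is a minimal sketch; the function names and the two capacity lists are illustrative assumptions, not values taken from the paper.

```python
def mean_capacity(caps):
    """Mean value of the bit channel capacities, cf. (6)."""
    return sum(caps) / len(caps)


def capacity_variance(caps):
    """Variance of the bit channel capacities around their mean, cf. (7)."""
    m = mean_capacity(caps)
    return sum((c - m) ** 2 for c in caps) / len(caps)


def variance_bound(caps):
    """Upper bound M(1 - M) on the variance, cf. (9)."""
    m = mean_capacity(caps)
    return m * (1.0 - m)


# Hypothetical bit channel capacities of a 4-SBP (illustrative values only).
partial = [0.1, 0.4, 0.6, 0.9]
# A fully polarized set with the same mean value 0.5.
polarized = [0.0, 0.0, 1.0, 1.0]

for caps in (partial, polarized):
    print(caps, capacity_variance(caps), variance_bound(caps))
```

Only the fully polarized list meets the bound with equality, in line with the equality condition stated for (9).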

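An important subset of $k$-SBPs is formed by those transforms
whose labeling rules are described by binary bijective
linear mappings. Let $\mathsf{W} = (\mathsf{B}_0 \times \dots \times \mathsf{B}_{k-1})$ be a vector
channel of $k$ independent B-DMCs $\mathsf{B}_0, \dots, \mathsf{B}_{k-1}$. Then, we
call the $k$-SBP

$$\varphi: (\mathsf{B}_0 \times \dots \times \mathsf{B}_{k-1}) \to \{\mathsf{B}_\varphi^{(0)}, \dots, \mathsf{B}_\varphi^{(k-1)}\} \tag{11}$$

a linear $k$-SBP if its labeling rule is given by

$$\mathcal{L}_\varphi: \mathbf{b} \in \mathbb{F}_2^k \mapsto \mathbf{b} \cdot \mathbf{A}_\varphi \in \mathbb{F}_2^k \tag{12}$$

with $\mathbf{b} := [b_0, b_1, \dots, b_{k-1}]$ and $\mathbf{A}_\varphi$ being an invertible binary
$(k, k)$ matrix. Clearly, the number of possible linear $k$-SBPs
equals the number of non-singular binary $(k, k)$ matrices and
is significantly smaller than that of general $k$-SBPs.

Fig. 1. Concatenation of two 2-SBPs $\varphi: \mathsf{W} \to \{\mathsf{B}_\varphi^{(0)}, \mathsf{B}_\varphi^{(1)}\}$ and $\psi: \mathsf{B}^2 \to \{\mathsf{B}_\psi^{(0)}, \mathsf{B}_\psi^{(1)}\}$.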
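To make the size difference concrete, the two counts can be compared directly. The sketch below uses the standard formula for the number of invertible $k \times k$ matrices over $\mathbb{F}_2$, namely $\prod_{i=0}^{k-1} (2^k - 2^i)$; the function names are illustrative assumptions.

```python
import math


def num_general_labelings(k):
    """Number of bijective labelings of 2^k symbols: (2^k)!."""
    return math.factorial(2 ** k)


def num_linear_labelings(k):
    """Number of invertible binary (k, k) matrices: prod_i (2^k - 2^i)."""
    count = 1
    for i in range(k):
        count *= 2 ** k - 2 ** i
    return count


for k in range(1, 5):
    print(k, num_linear_labelings(k), num_general_labelings(k))
```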

B. Product Concatenation of SBPs 

Under certain conditions discussed below, it is possible to
concatenate two (or more) SBPs in a product form. Let

$$\varphi: \mathsf{W} \to \{\mathsf{B}_\varphi^{(0)}, \dots, \mathsf{B}_\varphi^{(k_1-1)}\} \tag{13}$$

be an arbitrary $k_1$-SBP and

$$\psi: (\mathsf{B}_0 \times \dots \times \mathsf{B}_{k_2-1}) \to \{\mathsf{B}_\psi^{(0)}, \dots, \mathsf{B}_\psi^{(k_2-1)}\} \tag{14}$$

a $k_2$-SBP that takes a vector channel of $k_2$ independent
B-DMCs $\mathsf{B}_0, \dots, \mathsf{B}_{k_2-1}$ as an input. Each of the vector channels
$(\mathsf{B}_\varphi^{(i)})^{k_2}$ - obtained by taking $k_2$ independent instances of $\mathsf{B}_\varphi^{(i)}$
- can be partitioned by $\psi$. Thus, $\varphi$ and $\psi$ may be concatenated
by considering the vector channel $\mathsf{W}^{k_2}$, leading to a product
SBP of order $k_1 k_2$:



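$$\varphi \otimes \psi: \mathsf{W}^{k_2} \to \{\mathsf{B}_{\varphi\otimes\psi}^{(0)}, \dots, \mathsf{B}_{\varphi\otimes\psi}^{(k_1 k_2 - 1)}\}. \tag{15}$$

Here, the bit channels of $\varphi \otimes \psi$ are given by

$$\mathsf{B}_{\varphi\otimes\psi}^{(k_2 i + j)}: \{0,1\} \to \mathcal{Y}^{k_2} \times \{0,1\}^{k_2 i + j} \tag{16}$$

with symmetric capacities

$$I(\mathsf{B}_{\varphi\otimes\psi}^{(k_2 i + j)}) := I(B_{k_2 i + j}; Y_0, \dots, Y_{k_2-1} \mid B_0, \dots, B_{k_2 i + j - 1}) \tag{17}$$

such that

$$\frac{1}{k_2} \sum_{j=0}^{k_2-1} I(\mathsf{B}_{\varphi\otimes\psi}^{(k_2 i + j)}) = I(\mathsf{B}_\varphi^{(i)}) \tag{18}$$

for all $0 \le i < k_1$ and $0 \le j < k_2$. We remark that the product
transform $\varphi \otimes \psi$ is completely determined in a unique way by
the individual SBPs $\varphi$ and $\psi$ since their bit channels imply a
fixed order. Fig. 1 shows a simple example of such a product
concatenation of two 2-SBPs that results in a 4-SBP.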

The product concatenation of SBPs does not influence the
mean value of the bit channel capacities, since

$$M_{\varphi\otimes\psi}(\mathsf{W}^{k_2}) = \frac{1}{k_1 k_2} \sum_{i=0}^{k_1-1} \sum_{j=0}^{k_2-1} I(\mathsf{B}_{\varphi\otimes\psi}^{(k_2 i + j)}) = \frac{1}{k_1} I(X;Y) = M_\varphi(\mathsf{W}) \tag{19}$$

holds due to the chain rule of mutual information. However,
the variance of the bit channel capacities increases. It is given
by the sum of the variance of the first transform and the
averaged variance of the second transform around the bit
channel capacities of the first one:

$$V_{\varphi\otimes\psi}(\mathsf{W}^{k_2}) = V_\varphi(\mathsf{W}) + \frac{1}{k_1} \sum_{i=0}^{k_1-1} V_\psi\left((\mathsf{B}_\varphi^{(i)})^{k_2}\right). \tag{20}$$

This relation is proven in appendix A.
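The structure of (20) is that of a variance decomposition into a between-group and a within-group part. It can be checked numerically on made-up numbers. In the sketch below, the parent capacities stand for the bit channels of the first transform and each child group for the bit channels that the second transform produces from one parent; all values are hypothetical and chosen only so that each group mean equals its parent, as required by (18).

```python
def mean(values):
    return sum(values) / len(values)


def variance(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


# Hypothetical capacities of the first transform (k1 = 2).
parents = [0.3, 0.8]
# Hypothetical capacities after the second transform (k2 = 2);
# each group averages to the corresponding parent capacity.
children = [[0.1, 0.5], [0.65, 0.95]]


def product_variance_direct(groups):
    """Variance over all k1 * k2 bit channel capacities."""
    return variance([c for group in groups for c in group])


def product_variance_decomposed(parent_caps, groups):
    """Right-hand side of (20): first-stage variance plus averaged second-stage variance."""
    return variance(parent_caps) + mean([variance(g) for g in groups])


print(product_variance_direct(children))
print(product_variance_decomposed(parents, children))
```

Both evaluations agree up to floating-point rounding.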

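If $\varphi$ and $\psi$ are linear SBPs with labeling rules specified by
$\mathbf{A}_\varphi$ and $\mathbf{A}_\psi$, respectively, then their product $\varphi \otimes \psi$ is again a
linear $k_1 k_2$-SBP with labeling rule

$$\mathcal{L}_{\varphi\otimes\psi}: \mathbf{b} \in \mathbb{F}_2^{k_1 k_2} \mapsto \mathbf{b} \cdot \mathbf{P}_{k_1,k_2} \cdot (\mathbf{A}_\psi \otimes \mathbf{A}_\varphi). \tag{21}$$

Here, $\mathbf{A}_\psi \otimes \mathbf{A}_\varphi$ denotes the Kronecker product of $\mathbf{A}_\psi$ and
$\mathbf{A}_\varphi$. $\mathbf{P}_{k_1,k_2}$ is the $(k_1 k_2, k_1 k_2)$ permutation matrix that maps
the $(k_2 i + j)$-th component of the vector $\mathbf{b}$ to position $i + k_1 j$
(for all $0 \le i < k_1$, $0 \le j < k_2$).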
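As a small illustration of (21), the permutation matrix and the Kronecker product can be built explicitly over $\mathbb{F}_2$. This is a sketch under the index convention stated above (component $k_2 i + j$ moves to position $i + k_1 j$); the helper names are assumptions made for this example. Using the $2 \times 2$ kernel of Arıkan's construction for both transforms, the product labeling matrix comes out as the familiar length-4 polar generator matrix.

```python
def kron_f2(a, b):
    """Kronecker product of two binary matrices given as lists of rows."""
    return [[x & y for x in row_a for y in row_b]
            for row_a in a for row_b in b]


def matmul_f2(a, b):
    """Matrix product over F_2."""
    return [[sum(x & y for x, y in zip(row, col)) % 2 for col in zip(*b)]
            for row in a]


def permutation_matrix(k1, k2):
    """P such that component k2*i + j of b lands at position i + k1*j of b*P."""
    n = k1 * k2
    p = [[0] * n for _ in range(n)]
    for i in range(k1):
        for j in range(k2):
            p[k2 * i + j][i + k1 * j] = 1
    return p


def product_labeling_matrix(a_phi, a_psi):
    """Labeling matrix of the product SBP: P_{k1,k2} * (A_psi (x) A_phi)."""
    k1, k2 = len(a_phi), len(a_psi)
    return matmul_f2(permutation_matrix(k1, k2), kron_f2(a_psi, a_phi))


F2 = [[1, 0], [1, 1]]          # 2x2 kernel of the basic polar transform
G4 = product_labeling_matrix(F2, F2)
for row in G4:
    print(row)
```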



C. Parallel Binary Partitions 

Let $\mathsf{W}$ be a DMC with $2^k$-ary input as above. In analogy to
the sequential approach of SBPs, we define an order-$k$ parallel
binary partition ($k$-PBP) $\bar\varphi$ of $\mathsf{W}$ as a channel transform

$$\bar\varphi: \mathsf{W} \to \{\mathsf{B}_{\bar\varphi}^{(0)}, \dots, \mathsf{B}_{\bar\varphi}^{(k-1)}\} \tag{22}$$

that maps $\mathsf{W}$ to a set of independent B-DMCs. The bit channels
of the PBP $\bar\varphi$ are characterized by

$$\mathsf{B}_{\bar\varphi}^{(i)}: \{0,1\} \to \mathcal{Y} \tag{23}$$

with symmetric capacities

$$I(\mathsf{B}_{\bar\varphi}^{(i)}) := I(B_i; Y) \tag{24}$$

for $0 \le i < k$. Note that for a given $\mathsf{W}$ a $k$-SBP $\varphi$ turns
into a $k$-PBP $\bar\varphi$ if the order of the bit channels and, by this,
the information transfer from bit channels of lower indices are
discarded. We refer to $\varphi$ and $\bar\varphi$, which share the same labeling
rule $\mathcal{L}_\varphi$, as corresponding channel partitions.

Mean value $M_{\bar\varphi}(\mathsf{W})$ and variance $V_{\bar\varphi}(\mathsf{W})$ are defined in
analogy to (6) and (7); however, here the mean value depends
on the specific PBP $\bar\varphi$ and is (in general) smaller than that of
the corresponding SBP $\varphi$,

$$M_{\bar\varphi}(\mathsf{W}) \le M_\varphi(\mathsf{W}), \tag{25}$$

since obviously

$$I(\mathsf{B}_{\bar\varphi}^{(i)}) = I(B_i; Y) \le I(B_i; Y \mid B_0, \dots, B_{i-1}) = I(\mathsf{B}_\varphi^{(i)}) \tag{26}$$

holds for all pairs of bit channels. Unfortunately, a general
comparative statement on the variances of $\varphi$ and $\bar\varphi$ is not
possible due to the labeling-dependent mean value $M_{\bar\varphi}(\mathsf{W})$.



D. Concatenation of PBPs 

In contrast to the case of SBPs, there is no unique way
to concatenate parallel binary partitions since the output bit
channels of a PBP $\bar\varphi$ are mutually independent, allowing
for arbitrary permutations between the particular PBPs.

However, we point out that the (unpermuted) concatenation
of a $k$-PBP $\bar\varphi$ as in (22) with a $k$-SBP $\psi$ (that accepts $k$
B-DMCs as an input), i.e.,

$$\bar\varphi \odot \psi: \mathsf{W} \to \{\mathsf{B}_{\bar\varphi\odot\psi}^{(0)}, \dots, \mathsf{B}_{\bar\varphi\odot\psi}^{(k-1)}\}, \tag{27}$$

that simply connects the (independent) output channels of $\bar\varphi$
to the input of $\psi$, results in a kind of "degraded $k$-SBP" with
labeling rule $\mathcal{L}_{\bar\varphi\odot\psi}$ in the sense that its bit channels imply a
fixed order while their capacities do not sum up to $I(X;Y)$.
The bit channels of this transform are given by

$$\mathsf{B}_{\bar\varphi\odot\psi}^{(i)}: \{0,1\} \to \mathcal{Y} \times \{0,1\}^i \tag{28}$$

($0 \le i < k$) with symmetric capacities

I(B^) : 1-r.r.V IK. IK- (29) 

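where $B_{\bar\varphi,i}$ ($0 \le i < k$) denote the labels at the output of $\bar\varphi$.
The average of the bit channel capacities equals the value from the
PBP $\bar\varphi$:

$$\frac{1}{k} \sum_{i=0}^{k-1} I(\mathsf{B}_{\bar\varphi\odot\psi}^{(i)}) = M_{\bar\varphi}(\mathsf{W}) \le M_\varphi(\mathsf{W}) = \frac{1}{k} I(X;Y); \tag{30}$$

thus, the transform $\bar\varphi \odot \psi$ is in general not an SBP.

In case that $\bar\varphi$ and $\psi$ are linear channel transforms
represented by $\mathbf{A}_\varphi$ and $\mathbf{A}_\psi$, respectively, the concatenation $\bar\varphi \odot \psi$
is again a linear transform characterized by the labeling rule

$$\mathcal{L}_{\bar\varphi\odot\psi}: \mathbf{b} \mapsto \mathbf{b} \cdot (\mathbf{A}_\psi \cdot \mathbf{A}_\varphi), \tag{31}$$

i.e., the common matrix product of $\mathbf{A}_\psi$ and $\mathbf{A}_\varphi$.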

III. Polar Codes 

Polar codes, as introduced by Arıkan [1], have been shown
to be a channel coding construction that provably achieves the
symmetric capacity of arbitrary binary-input discrete memoryless
channels (B-DMCs) under low-complexity encoding and
successive cancellation (SC) decoding. For the sake of simplicity,
we focus on Arıkan's original construction in this paper; the
generalization to polar codes based on different kernels (as
considered, e.g., in [13]) is straightforward. Furthermore, we
restrict our considerations to the SC decoding algorithm as
in [1]; however, our results regarding the code construction are
also valid for other (better performing) decoders that are based
on the SC algorithm, e.g., list decoding [14].

A. Code Construction 

Let $\mathsf{B}: \{0,1\} \to \mathcal{Y}$ be a B-DMC and $I(\mathsf{B})$ its symmetric
capacity, i.e., the mutual information of $\mathsf{B}$ assuming equiprobable
binary input symbols. Encoding takes place in the binary
field $\mathbb{F}_2$. The encoding operation for a polar code of length
$N$ may be described by multiplication of a binary length-$N$
vector $\mathbf{u}$ - containing the information symbols as well as some
symbols with fixed values (so-called frozen symbols) that do
not hold any information - with a generator matrix $\mathbf{G}_N$ that
is defined by the recursive relation

$$\mathbf{G}_N = \mathbf{B}_N \mathbf{F}_N, \qquad \mathbf{F}_{2N} = \mathbf{F}_2 \otimes \mathbf{F}_N, \qquad \mathbf{F}_2 = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}, \tag{32}$$

where $N$ is a power of two and $\otimes$ again denotes the Kronecker
product. $\mathbf{B}_N$ denotes the $(N, N)$ bit-reversal permutation
matrix [1]. The resulting codeword $\mathbf{c} = \mathbf{u} \mathbf{G}_N$ is then transmitted
in $N$ time steps over the binary channel $\mathsf{B}$.

Fig. 2. Polar coding construction for $N = 2$.
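The encoding rule can be written out in a few lines. The following is a minimal sketch, assuming Arıkan's standard kernel $\mathbf{F}_2 = [[1, 0], [1, 1]]$ and bit-reversal ordering of the rows; function names are illustrative, and the dense matrix product is used only for clarity (practical encoders use the $O(N \log N)$ butterfly structure instead).

```python
F2 = [[1, 0], [1, 1]]


def kron_f2(a, b):
    """Kronecker product of two binary matrices given as lists of rows."""
    return [[x & y for x in row_a for y in row_b]
            for row_a in a for row_b in b]


def f_matrix(n):
    """n-fold Kronecker power of the kernel F2, a (2^n, 2^n) matrix."""
    f = [[1]]
    for _ in range(n):
        f = kron_f2(F2, f)
    return f


def bit_reverse(i, n):
    """Reverse the n-bit binary representation of index i."""
    return int(format(i, "0{}b".format(n))[::-1], 2) if n else 0


def generator_matrix(n):
    """G_N = B_N F_N: rows of F_N reordered by bit reversal."""
    f = f_matrix(n)
    return [f[bit_reverse(i, n)] for i in range(1 << n)]


def encode(u, g):
    """Codeword c = u * G_N over F_2."""
    size = len(u)
    return [sum(u[i] & g[i][j] for i in range(size)) % 2 for j in range(size)]


G4 = generator_matrix(2)
print(G4)
print(encode([1, 1, 0, 0], G4))
```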

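The code construction is based on a channel combining and
channel splitting operation [1] that may be represented as a
linear 2-SBP

$$\pi: \mathsf{B}^2 \to \{\mathsf{B}_\pi^{(0)}, \mathsf{B}_\pi^{(1)}\} \tag{33}$$

that partitions the vector channel $\mathsf{B}^2$, i.e., two independent and
identical instances of $\mathsf{B}$, into two bit channels

$$\mathsf{B}_\pi^{(0)}: \{0,1\} \to \mathcal{Y}^2, \qquad \mathsf{B}_\pi^{(1)}: \{0,1\} \to \mathcal{Y}^2 \times \{0,1\} \tag{34}$$

with symmetric capacities

$$I(\mathsf{B}_\pi^{(0)}) = I(U_0; Y_0, Y_1), \qquad I(\mathsf{B}_\pi^{(1)}) = I(U_1; Y_0, Y_1 \mid U_0). \tag{35}$$

The labeling rule is given by

$$\mathcal{L}_\pi: \mathbf{u} = [u_0, u_1] \in \mathbb{F}_2^2 \mapsto \mathbf{u} \cdot \mathbf{G}_2 \in \mathbb{F}_2^2 \tag{36}$$

as visualized in Fig. 2. Since the average capacity per binary
symbol does not change under an SBP, we will denote the
mean value of the bit channel capacities of $\pi$ by $I(\mathsf{B})$ instead
of $M_\pi(\mathsf{B})$ in the following.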

It follows easily from (21) by comparison of the permutation
matrices that the construction of a polar code of length
$N = 2^n$ may be equivalently represented by the $n$-fold product
concatenation of $\pi$ as defined in the preceding section. The
resulting SBP $\pi^n$ generates a partition of the vector channel

$$\pi^n: \mathsf{B}^N \to \{\mathsf{B}_{\pi^n}^{(0)}, \dots, \mathsf{B}_{\pi^n}^{(N-1)}\} \tag{37}$$

into $N$ bit channels

$$\mathsf{B}_{\pi^n}^{(i)}: \{0,1\} \to \mathcal{Y}^N \times \{0,1\}^i \tag{38}$$

($0 \le i < N$) with symmetric capacities

$$I(\mathsf{B}_{\pi^n}^{(i)}) := I(U_i; Y_0, \dots, Y_{N-1} \mid U_0, \dots, U_{i-1}). \tag{39}$$

Here, the labeling rule is given by

$$\mathcal{L}_{\pi^n}: \mathbf{u} \in \mathbb{F}_2^N \mapsto \mathbf{u} \cdot \mathbf{G}_N \in \mathbb{F}_2^N. \tag{40}$$

Therefore, the transmission of each source symbol $u_i$ can
be described by its own bit channel $\mathsf{B}_{\pi^n}^{(i)}$. The output of each
channel $\mathsf{B}_{\pi^n}^{(i)}$ depends on the values of the symbols of lower
indices $u_0, \dots, u_{i-1}$. Thus, the channels $\mathsf{B}_{\pi^n}^{(i)}$ imply a specific
decoding order.

For data transmission only the bit channels with highest
capacity are used, referred to as information channels. The
data transmitted over the remaining bit channels (so-called
frozen channels) are fixed values known to the decoder. By
this means, the code rate can be chosen in very small steps
of $1/N$ without the need for changing the code construction
- a property especially useful for polar-coded modulation (cf.
Sec. IV-B).

In order to select the optimal set of frozen channels, the
values of the capacities $I(\mathsf{B}_{\pi^n}^{(i)})$ are required. These can either
be obtained by simulation or by density evolution [15].
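For the binary erasure channel the bit channel capacities are known in closed form, which makes the selection step easy to illustrate without simulation or density evolution. The sketch below assumes a BEC with capacity $I$, for which one polarization step yields the capacities $I^2$ and $2I - I^2$ (Arıkan's recursion); other channels need the methods named above. Function names and parameters are illustrative.

```python
def bec_bit_channel_capacities(capacity, n):
    """Capacities of the N = 2^n bit channels of a polar code over a BEC."""
    caps = [capacity]
    for _ in range(n):
        nxt = []
        for c in caps:
            nxt.append(c * c)            # degraded bit channel (index 2i)
            nxt.append(2 * c - c * c)    # upgraded bit channel (index 2i + 1)
        caps = nxt
    return caps


def select_information_set(caps, num_info):
    """Indices of the num_info most reliable bit channels, in ascending order."""
    order = sorted(range(len(caps)), key=lambda i: caps[i], reverse=True)
    return sorted(order[:num_info])


caps = bec_bit_channel_capacities(0.5, 3)      # N = 8, BEC with I(B) = 0.5
info_set = select_information_set(caps, 4)     # code rate 4/8
frozen_set = [i for i in range(len(caps)) if i not in info_set]
print(caps)
print(info_set, frozen_set)
```

Changing `num_info` by one changes the code rate by $1/N$ without touching the encoder.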



B. Successive Decoding 

Upon receiving a vector $\mathbf{y}$ - being a noisy version of the
codeword $\mathbf{c}$ resulting from transmission over the channel $\mathsf{B}$
- the information symbols $u_i$ can be estimated successively
for $i = 0, \dots, N-1$. Here, information combining [16]
of reliability values obtained from the channel output $\mathbf{y}$ is
performed instead of $\mathbb{F}_2$ arithmetic as in the encoding process.

The successive cancellation (SC) decoding algorithm [1] for
polar codes generates estimates on the information symbols $u_i$
(transmitted over the channel $\mathsf{B}_{\pi^n}^{(i)}$) one after another, making
use of the already decoded symbols $u_0, \dots, u_{i-1}$. We denote
the probability that an erroneous decision is made at index $i$,
given that the previous decisions have been correct, by $p_e(\mathsf{B}_{\pi^n}^{(i)})$.
Thus, the word error rate for SC decoding ($\mathrm{WER}_{\mathrm{SC}}$) is given
by

$$\mathrm{WER}_{\mathrm{SC}} = 1 - \prod_{i \in \mathcal{A}} \left(1 - p_e(\mathsf{B}_{\pi^n}^{(i)})\right) \tag{41}$$

where $\mathcal{A}$ denotes the set of indices of the information channels.
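Evaluating (41) is a one-line product once the first-error probabilities are available. The sketch below uses made-up probabilities purely for illustration and also prints their sum, which upper-bounds the product form (union bound).

```python
def word_error_rate_sc(first_error_probs):
    """Word error rate from per-index first-error probabilities, cf. (41)."""
    prob_all_correct = 1.0
    for p in first_error_probs:
        prob_all_correct *= 1.0 - p
    return 1.0 - prob_all_correct


# Hypothetical first-error probabilities of three information channels.
p_e = [0.01, 0.02, 0.005]
wer = word_error_rate_sc(p_e)
union_bound = sum(p_e)
print(wer, union_bound)
```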






C. Variance of the Bit Channel Capacities 

With increasing block length, the set of bit channels $\mathsf{B}_{\pi^n}^{(i)}$
shows a polarization effect in the sense that the capacity
$I(\mathsf{B}_{\pi^n}^{(i)})$ of almost each bit channel is either near 0 or near
1. The fraction of bit channels not being either completely
noisy or completely noiseless tends to zero [1].

In the following, we show that this polarization effect may
be represented by the sequence of variances of the respective
polar codes' bit channel capacities for increasing block length.
The variance of the bit channel capacities of a length-$N$ polar
code around their mean value $I(\mathsf{B})$ is given by

$$V_{\pi^n}(\mathsf{B}^N) = \frac{1}{N} \sum_{i=0}^{N-1} \left( I(\mathsf{B}_{\pi^n}^{(i)}) - I(\mathsf{B}) \right)^2. \tag{42}$$

Using (20), we notice that the sequence of variances increases
monotonically as the block length gets larger, i.e.,

$$V_{\pi^{n+1}}(\mathsf{B}^{2N}) \ge V_{\pi^n}(\mathsf{B}^N). \tag{43}$$

Furthermore, from (10) the sequence $\{V_{\pi^n}(\mathsf{B}^N)\}_{n \in \mathbb{N}}$ is
upper-bounded by

$$V_{\pi^n}(\mathsf{B}^N) \le I(\mathsf{B}) \left(1 - I(\mathsf{B})\right) \tag{44}$$

for all $n \in \mathbb{N}$. According to (10), this maximum variance
is achieved if and only if all bit channel capacities $I(\mathsf{B}_{\pi^n}^{(i)})$
are either 0 or 1, which obviously corresponds to the state
of perfect polarization. As shown by Arıkan [1], the latter is
asymptotically approached while the block length $N$ goes to
infinity; therefore, we have

$$\lim_{n \to \infty} V_{\pi^n}(\mathsf{B}^N) = I(\mathsf{B}) \cdot \left(1 - I(\mathsf{B})\right). \tag{45}$$

Fig. 3. Bit channel variance for polar codes over various B-DMCs, block
length $N = 2^n$, $n = 1, 2, 3, 8, 12, 20$. Blue solid: BEC, blue dashed: BSC,
red dashed: binary-input AWGN channel (Gaussian approximation). Black:
upper bound on the variance.

Although we have not yet been able to establish an explicit 
relation between bit channel capacity variance and code error 
performance, one would intuitively expect that increasing the 
variance by a careful code design should correspond to a 
sharper polarization of the bit channels and therefore should 
lead to better performing polar codes in terms of word error 
rate or bit error rate. 

Fig. 3 depicts the variance of the bit channel capacities for
polar codes of various block lengths constructed over several
B-DMCs as a function of their capacity. Besides the binary
erasure channel (BEC) and the binary symmetric channel
(BSC) [12] - that represent the extremes of information
combining [16] and serve as an upper and lower bound,
respectively - values for the binary-input AWGN channel
are given that have been obtained by density evolution with a
Gaussian approximation, as is explained in Sec. VI. Obviously,
the inaccuracy introduced by this approximation increases with
decreasing channel capacity $I(\mathsf{B})$. The converging behaviour
for increasing block length $N$ towards the maximum achievable
variance (black line) can clearly be observed.
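The monotone growth (43) and the bound (44) can be reproduced for the BEC, where the bit channel capacities follow exactly from the recursion $I \mapsto (I^2, 2I - I^2)$. This is a sketch for the BEC only; the chosen capacity 0.5 and the range of $n$ are arbitrary illustration parameters.

```python
def polarize_bec(caps):
    """One polarization step over a BEC: each capacity c splits into c^2 and 2c - c^2."""
    out = []
    for c in caps:
        out.extend((c * c, 2 * c - c * c))
    return out


def capacity_variance(caps):
    """Variance of the bit channel capacities around their mean."""
    m = sum(caps) / len(caps)
    return sum((c - m) ** 2 for c in caps) / len(caps)


def variance_sequence(capacity, max_n):
    """Variances for block lengths N = 2, 4, ..., 2^max_n."""
    caps = [capacity]
    seq = []
    for _ in range(max_n):
        caps = polarize_bec(caps)
        seq.append(capacity_variance(caps))
    return seq


capacity = 0.5
seq = variance_sequence(capacity, 12)
bound = capacity * (1.0 - capacity)
for n, v in enumerate(seq, start=1):
    print(n, v, bound)
```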

IV. Multilevel Polar Coding 

We now consider the conventional discrete-time equivalent
system model of $M$-ary digital pulse-amplitude modulation
(PAM) [17] - $M = 2^m$ being a power of 2 - with signal
constellations of real-valued signal points (ASK) or of
complex-valued signal points (PSK, QAM etc.) over a memoryless
channel $\mathsf{W}$, e.g., the AWGN channel.

From an information-theoretic point of view, an optimal
combination of binary coding and $M$-ary modulation follows
the multilevel coding (MLC) principle [8], [9].

A. Multilevel Coding 

In the MLC approach, the $M$-ary channel $\mathsf{W}$ is partitioned
into $m$ bit channels (also called bit levels) by means of an
$m$-SBP

$$\lambda: \mathsf{W} \to \{\mathsf{B}_\lambda^{(0)}, \dots, \mathsf{B}_\lambda^{(m-1)}\}. \tag{46}$$

The mapping from binary labels to amplitude coefficients is
specified by the labeling rule $\mathcal{L}_\lambda$.

Channel coding is implemented in the MLC setup by using
binary component codes [9] for each of the bit levels
individually with correspondingly chosen code rates $R_i$. The
overall rate (bit per transmission symbol) is given as the
sum $R = \sum_{i=0}^{m-1} R_i$. The receiver then performs multi-stage
decoding (MSD), i.e., it computes reliability information for
decoding of the first bit level, which is passed to the decoder
of the first component code. The decoding results are used for
demapping and decoding of the next bit level, and so on.

According to the capacity rule [9], the code rate for the
$i$-th level should match the bit level capacity $I(\mathsf{B}_\lambda^{(i)})$. Since
these capacities vary significantly for the different levels,
channel codes that allow for a very flexible choice of the code
rate are preferred for MLC.

The mutual information between the channel input and
channel output of $\mathsf{W}$ assuming equiprobable source symbols is
also referred to as the coded modulation [9], or
constellation-constrained, capacity $C_{\mathrm{cm}}(\mathsf{W})$. It is related to the average
capacity per binary symbol (6) of $\mathsf{W}$ by

$$C_{\mathrm{cm}}(\mathsf{W}) := I(X;Y) = \sum_{i=0}^{m-1} I(\mathsf{B}_\lambda^{(i)}) = m \cdot M_\lambda(\mathsf{W}). \tag{47}$$

Since $\lambda$ is an SBP, the coded modulation capacity does not
depend on the specific labeling rule $\mathcal{L}_\lambda$.

A potential drawback of the MLC approach for practical
use lies in the necessity for using several (comparatively short)
component codes with varying code rates for the particular bit
levels.

B. Multilevel Polar Coding 

We have shown that both the multilevel coding construction
and the polar coding transform may be described by SBPs.
This allows us to represent the combination of MLC with polar
codes in a simple form as a product concatenation of SBPs.
It also provides insight into how the labeling $\mathcal{L}_\lambda$ should be chosen
in an optimal way.

A multilevel polar code of length $mN$, i.e., a multilevel
code with length-$N$ component polar codes over an $M$-ary
constellation, is obtained by the order-$mN$ concatenation of
the $m$-SBP $\lambda$ of MLC and the $N$-SBP $\pi^n$ of the polar code:

$$\lambda \otimes \pi^n: \mathsf{W}^N \to \{\mathsf{B}_{\lambda\otimes\pi^n}^{(0)}, \dots, \mathsf{B}_{\lambda\otimes\pi^n}^{(mN-1)}\} \tag{48}$$

as defined in (15). The encoding process for this multilevel
polar code is described by the generator matrix

$$\mathbf{P}_{m,N} \cdot (\mathbf{G}_N \otimes \mathbf{I}_m) \tag{49}$$

with $\mathbf{P}_{m,N}$ as in (21), followed by labeling and mapping to
the $N$ transmit symbols as defined by $\lambda$. Here, $\mathbf{I}_m$ denotes
the $(m, m)$ identity matrix.

Fig. 4. Bit channel capacities of a multilevel polar code ($mN = 512$) using
16-ASK modulation with labeling according to the set-partitioning rule [18]
over the AWGN channel at $10 \log_{10}(E_s/N_0) = 7$ dB. The overall rate is
$R = 1.5$. Frozen channels are marked by filled circles.

The word error rate for successive decoding of a multilevel
polar code ($\mathrm{WER}_{\mathrm{SC}}$) is given by

$$\mathrm{WER}_{\mathrm{SC}} = 1 - \prod_{i \in \mathcal{A}} \left(1 - p_e(\mathsf{B}_{\lambda\otimes\pi^n}^{(i)})\right) \tag{50}$$

in analogy to (41). Here, $\mathcal{A}$ denotes the set of indices of the
channels used for information transmission while $p_e(\mathsf{B}_{\lambda\otimes\pi^n}^{(i)})$
stands for the probability of a first wrong decision at index $i$
in the successive decoding process, as before.

We remark that the selection of frozen channels - and thus,
the rate allocation - is done in exactly the same way as
for a usual binary polar code by determining the symmetric
capacities $I(\mathsf{B}_{\lambda\otimes\pi^n}^{(i)})$ ($0 \le i < mN$) and choosing the most
reliable bit channels for data transmission. This selection
process is visualized by way of example in Fig. 4 for an artificial
choice of parameters. Therefore, the explicit application of
a rate allocation rule to the particular component codes -
as considered in the original MLC approach [9] - is not
needed in case of multilevel polar codes. However, it has been
shown [19] that the rate allocations obtained by this method
basically equal those obtained from the capacity rule.

According to (20), the variance of the bit channels of a
multilevel polar code with length-$N$ component codes is given
by

$$V_{\lambda\otimes\pi^n}(\mathsf{W}^N) = V_\lambda(\mathsf{W}) + \frac{1}{m} \sum_{i=0}^{m-1} V_{\pi^n}\left((\mathsf{B}_\lambda^{(i)})^N\right). \tag{51}$$

Thus, the SBP $\lambda$ - that represents the modulation step - may
be seen as the first polarization step of a multilevel polar code.
From this representation, it is clear that $\lambda$ should be chosen
such that it maximizes the term (51).

In this approach, both binary coding and $2^m$-ary modulation
are represented in a unified form as a sequential binary channel
partition of the vector channel $\mathsf{W}^N$. Both should be designed
according to the polarization principle, i.e., the maximization of
the variance of the bit channel capacities (51) under successive
cancellation - or, equivalently, multi-stage - decoding by
careful choice of the labeling rule.



C. Multilevel Polar Codes are Capacity-Achieving 

Using MLC with multi-stage decoding, an $M$-ary channel
$\mathsf{W}$ is split into $m$ bit levels $\mathsf{B}_\lambda^{(i)}$ ($0 \le i < m$) that are
B-DMCs as long as $\mathsf{W}$ is a DMC. Their symmetric capacities
sum up to $C_{\mathrm{cm}}(\mathsf{W})$, cf. (47). According to [1, Th. 1], the polar
component codes approach each of these bit level capacities
while their block length increases.

We thus conclude that multilevel polar codes together with
MSD and SC decoding achieve the coded modulation capacity
$C_{\mathrm{cm}}(\mathsf{W})$ for arbitrary $M$-ary signal constellations in case of a
memoryless transmission channel. All results on the speed of
convergence considering transmission over a single B-DMC
hold as well in the case of MLC.

Obviously, this (asymptotic) result does not depend on the
labeling rule $\mathcal{L}_\lambda$ applied in MLC. However, for finite-length
codes the labeling has significant impact on the performance
of polar-coded MLC.

D. Influence of the Labeling Rule 

From (51), it is clear that a labeling rule $\mathcal{L}_\lambda$ should be
applied that leads to a large variance of the bit level
capacities. Here, we focus on two labeling approaches that follow
contrary aims:

• In the set-partitioning (SP) labeling approach
(corresponding to $\lambda_{\mathrm{SP}}$) by Ungerboeck [18], for each of the
bit levels - starting from the lowest one - the sets of
signal points corresponding to the following bit level are
chosen such that the minimum Euclidean distance within
the subsets is maximized. Therefore, the increment of
mutual information from each level to the next one is
designed to be large - if there is knowledge about the
previous levels - which should lead to widely separated
bit level capacities corresponding to large values of the
variance $V_{\lambda_{\mathrm{SP}}}(\mathsf{W})$.

• As opposed to that, the Gray labeling approach $\lambda_{\mathrm{G}}$
aims to generate bit levels that are as independent as
possible [20]. Here, we expect bit levels with capacities
that do not differ significantly, leading to a small variance
$V_{\lambda_{\mathrm{G}}}(\mathsf{W})$ of the bit level capacities.

Fig. 5 depicts the variance of the bit levels for ASK modulation
using both SP and (binary-reflected) Gray labeling. Here, we
focus on the variance curves for multi-stage decoding (solid
lines); the variances under parallel decoding will be considered
in the following section. It can be observed that - except for
small capacities $M_\lambda(\mathsf{W})$ - the SP labeling approach leads
to significantly larger bit level variances compared to Gray
labeling, as expected. Therefore, for multilevel polar codes,
SP labeling should preferably be applied.
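The effect of the labeling on the bit level capacities under multi-stage decoding can be reproduced with a short numerical experiment. The following sketch makes several assumptions that are not taken from the paper: 4-ASK with unit average energy, $E_s/N_0 = 10$ dB with noise variance $N_0/2$, natural labeling (least significant bit as first level) standing in for SP labeling, and plain numerical integration on a grid. It uses the identity $I(B_i; Y \mid B_0, \dots, B_{i-1}) = h(Y \mid B_0, \dots, B_{i-1}) - h(Y \mid B_0, \dots, B_i)$.

```python
import math


def mixture_entropy(points, sigma, steps=4000):
    """Differential entropy (bits) of Y = X + noise, X uniform on `points`."""
    lo = min(points) - 8.0 * sigma
    hi = max(points) + 8.0 * sigma
    dy = (hi - lo) / steps
    norm = 1.0 / (len(points) * sigma * math.sqrt(2.0 * math.pi))
    entropy = 0.0
    for s in range(steps + 1):
        y = lo + s * dy
        p = norm * sum(math.exp(-(y - x) ** 2 / (2.0 * sigma ** 2)) for x in points)
        if p > 0.0:
            entropy -= p * math.log2(p) * dy
    return entropy


def bit_level_capacities(points, labels, m, sigma):
    """I(B_i; Y | B_0..B_{i-1}) for i = 0..m-1; bit i of a label is level i."""
    def avg_entropy(depth):
        # Average entropy of Y given the first `depth` label bits.
        groups = {}
        mask = (1 << depth) - 1
        for x, label in zip(points, labels):
            groups.setdefault(label & mask, []).append(x)
        return sum(mixture_entropy(g, sigma) for g in groups.values()) / len(groups)
    h = [avg_entropy(d) for d in range(m + 1)]
    return [h[i] - h[i + 1] for i in range(m)]


def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


m = 2
M = 1 << m
scale = math.sqrt(3.0 / (M * M - 1))                  # unit average energy
points = [(2 * p - M + 1) * scale for p in range(M)]
sigma = math.sqrt(1.0 / (2.0 * 10.0 ** (10.0 / 10.0)))  # Es/N0 = 10 dB (assumed)

sp_labels = list(range(M))                            # natural labeling, LSB first
gray_labels = [p ^ (p >> 1) for p in range(M)]        # binary-reflected Gray

sp_caps = bit_level_capacities(points, sp_labels, m, sigma)
gray_caps = bit_level_capacities(points, gray_labels, m, sigma)
print("SP:  ", sp_caps, variance(sp_caps))
print("Gray:", gray_caps, variance(gray_caps))
```

Both labelings yield the same sum of bit level capacities, as (47) requires, while the spread of the individual capacities differs.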

Furthermore, when compared to the corresponding variance
curves of polar codes over a single B-DMC for $N = 2, 4, 8$
as shown in Fig. 3, especially in case of SP labeling the
achieved bit level variance is significantly higher, underlining
the importance of the careful choice of the labeling $\mathcal{L}_\lambda$ in this
first step of polarization for multilevel polar codes.

Fig. 5. Bit level variance for $2^m$-ary ASK signalling over the AWGN
channel (m = 2, 4, 8). Solid lines: multi-stage decoding, dashed lines: parallel
decoding. Red: SP labeling, black: Gray labeling.

V. Bit-Interleaved Polar-Coded Modulation 

In contrast to the successive approach used for MLC with
MSD, in a BICM setup all bit levels are treated equally at
both sides, the transmitter and the receiver [6], [7].

A. Bit-Interleaved Coded Modulation 

We assume here the same underlying $2^m$-ary channel $\mathsf{W}$ as
before. The source bits in BICM are encoded using a single
binary channel code with rate $R_c$, leading to an overall rate
of $R = m \cdot R_c$. The code symbols are (possibly) interleaved
according to some pseudo-random order and partitioned into
$m$-tuples of code symbols, which are then mapped to amplitude
coefficients $x \in \mathcal{X}$.

The BICM receiver performs parallel decoding, i.e., it
neglects the relations between the bit levels and computes
reliability information independently for each bit level based
on the received symbol. These bit metrics are deinterleaved
and fed to the decoder. Thus, the channel transform used in
the BICM setup may be represented by an $m$-PBP $\bar\lambda$

$$\bar\lambda: \mathsf{W} \to \{\mathsf{B}_{\bar\lambda}^{(0)}, \dots, \mathsf{B}_{\bar\lambda}^{(m-1)}\}. \tag{52}$$

The BICM capacity² (or parallel-decoding capacity) of the
channel $\mathsf{W}$ is given as the sum of the bit level capacities
$I(\mathsf{B}_{\bar\lambda}^{(i)})$ neglecting the feedback of lower bit levels; therefore,
from (26) it is generally smaller than the coded-modulation
capacity:

$$C_{\bar\lambda,\mathrm{bicm}}(\mathsf{W}) = \sum_{i=0}^{m-1} I(\mathsf{B}_{\bar\lambda}^{(i)}) \le C_{\mathrm{cm}}(\mathsf{W}). \tag{53}$$

This loss of the BICM capacity w.r.t. the coded-modulation
capacity depends on $\mathsf{W}$, but also on the applied
labeling rule $\mathcal{L}_{\bar\lambda}$. It has been shown that - except for the case of
very low capacities $C_{\bar\lambda,\mathrm{bicm}}(\mathsf{W})$ - this loss is minimized when
Gray labeling is used whereas SP labeling leads to a significant
loss of mutual information [20], [21]. The labeling-dependent
difference in behaviour under parallel decoding - when compared
to MSD - is also evident from the bit level variances in
Fig. 5. While the curves for MSD and parallel decoding do not
differ significantly in case of Gray labeling, for SP labeling
with parallel decoding a serious degradation is observed.
Therefore, we will only consider Gray labeling $\mathcal{L}_{\bar\lambda_{\mathrm{G}}}$ in the
BICM setup.

² assuming equiprobable input symbols

Fig. 6. Encoding graph for a BICM polar code of length $N = 8$ with
generator matrix $\mathbf{G}_8 = \mathbf{B}_8 \mathbf{F}_8$ for a 16-ary constellation. The bit-reversal
permutation $\mathbf{B}_8$ has already been applied to $\mathbf{u}$.

B. Bit-Interleaved Polar-Coded Modulation 

Since the labeling C\ G is fixed, there remain two ways for 
optimizing the combination of polar codes and BICM: either 
by designing an optimized interleaver or by changing the polar 
code itself. 

In 0, the interleaver design has been considered. Clearly, 
the bit channel variance for a length-mTV BICM polar code 
depends on how the bit channels B^' obtained from the N 
transmission symbols - with varying capacities - are allocated 
to the order-mTV polar coding transform. The authors showed 
that by means of a partial exhaustive search a performance 
improvement can be observed when compared to random 
interleaving 0. 

Here, we will follow the second approach: We assume that
no interleaver is used at all. Since we focus on memoryless
transmission channels such as the AWGN channel in this work,
this is a reasonable assumption.³ Now, the straightforward
approach of combining BICM over a $2^m$-ary constellation
with polar codes simply connects a polar code of length $mN$,
described by a generator matrix $G_{mN}$, to the $m$-PBP $\lambda_{\mathrm{G}}$.
In order to use Arıkan's standard construction, we assume $m$
to be a power of two itself. Otherwise, we would have to use
a polar code with a different kernel. Fig. 6 shows an example

3 We are motivated by the fact that for BICM with convolutional codes over
the AWGN channel, even a significant performance gain for the
interleaver-free case w.r.t. random interleaving can be observed [22].






of a simple BICM polar code obtained in this way where the 
input symbols of a length-8 polar code are mapped onto two 
symbols of a 16-ary constellation. 

The overall channel transform for this unpermuted approach
is given by $\lambda_{\mathrm{G}} \odot \pi^{\log_2(m)} \otimes \pi^n$. From Sec. III-D we know
that the first part $\lambda_{\mathrm{G}} \odot \pi^{\log_2(m)}$ may be seen as a degraded
$m$-SBP, represented by the labeling rule
$\mathcal{L}_{\lambda_{\mathrm{G}} \odot \pi^{\log_2(m)}}$. Since
$\mathcal{L}_{\lambda_{\mathrm{G}}}$ is fixed, our optimization approach for polar-coded BICM
consists in changing the first polarization steps of the polar
code, i.e., we replace the $m$-SBP $\pi^{\log_2(m)}$ by an optimized
$m$-SBP $\tau$ that maximizes the bit channel variance of $\lambda_{\mathrm{G}} \odot \tau$.

C. Transformation of Labelings 

It has been shown [23] that for one-dimensional constellations,
natural labeling and binary reflected Gray labeling
can be transformed into each other by a (bijective) linear
transform.

1) ASK/PSK Constellations: A natural labeling (counting in
binary numbers) over an $M$-ary ASK/PSK constellation, which
is identical to an SP labeling in this case, can be represented
as an $(M, m)$ binary matrix $M_{\mathrm{SP},m}$ ($m = \log_2(M)$)
containing the binary representations of the numbers $0, \dots, M-1$
as rows. Here, the left-most column represents the least
significant bit. Similarly, a (binary reflected) Gray labeling is
given by a binary matrix $M_{\mathrm{Gray},m}$ of equal dimensions. Below,
an example for $m = 3$ is given:



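$$
M_{\mathrm{SP},3} =
\begin{bmatrix}
0 & 0 & 0 \\
1 & 0 & 0 \\
0 & 1 & 0 \\
1 & 1 & 0 \\
0 & 0 & 1 \\
1 & 0 & 1 \\
0 & 1 & 1 \\
1 & 1 & 1
\end{bmatrix},
\qquad
M_{\mathrm{Gray},3} =
\begin{bmatrix}
0 & 0 & 0 \\
1 & 0 & 0 \\
1 & 1 & 0 \\
0 & 1 & 0 \\
0 & 1 & 1 \\
1 & 1 & 1 \\
1 & 0 & 1 \\
0 & 0 & 1
\end{bmatrix}
\tag{54}
$$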
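The two label matrices can be generated for any m. The sketch below is only an illustration: the function names are hypothetical, and the bit order (least significant bit in the left-most column) follows the convention stated in the text.

```python
def natural_labels(m):
    """Rows are the binary representations of 0..2^m-1, least significant bit first."""
    return [[(k >> b) & 1 for b in range(m)] for k in range(2 ** m)]

def gray_labels(m):
    """Binary reflected Gray code of 0..2^m-1, in the same bit order."""
    return [[((k ^ (k >> 1)) >> b) & 1 for b in range(m)] for k in range(2 ** m)]

def hamming(a, b):
    """Number of positions in which two labels differ."""
    return sum(x != y for x, y in zip(a, b))

M_SP_3 = natural_labels(3)
M_GRAY_3 = gray_labels(3)

# Neighbouring Gray labels differ in exactly one bit; natural labels generally do not.
gray_steps = [hamming(M_GRAY_3[k], M_GRAY_3[k + 1]) for k in range(7)]
natural_steps = [hamming(M_SP_3[k], M_SP_3[k + 1]) for k in range(7)]
```

For m = 3 the generated rows reproduce the matrices of the example above.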



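As shown in [23], a set-partitioning labeling of an $M$-ASK/PSK
constellation can be transformed into a binary
reflected Gray labeling via an $(m, m)$ binary matrix

$$
T_m =
\begin{bmatrix}
1 &        &        &   \\
1 & 1      &        &   \\
  & \ddots & \ddots &   \\
  &        & 1      & 1
\end{bmatrix}
\tag{55}
$$

such that

$$
M_{\mathrm{SP},m} \cdot T_m = M_{\mathrm{Gray},m}
\tag{56}
$$

holds.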
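Relation (56) can be checked numerically over GF(2). The following sketch assumes that T_m carries ones on the main diagonal and on the first subdiagonal, which is the structure implied by the expression for x = uT_m given in (59); all function names are hypothetical.

```python
def natural_labels(m):
    """Natural (SP) label matrix, least significant bit in the left-most column."""
    return [[(k >> b) & 1 for b in range(m)] for k in range(2 ** m)]

def gray_labels(m):
    """Binary reflected Gray label matrix in the same bit order."""
    return [[((k ^ (k >> 1)) >> b) & 1 for b in range(m)] for k in range(2 ** m)]

def t_matrix(m):
    """T_m with ones on the main diagonal and the first subdiagonal (assumed form)."""
    return [[1 if c == r or c == r - 1 else 0 for c in range(m)] for r in range(m)]

def matmul_gf2(a, b):
    """Matrix product over GF(2) for lists of 0/1 rows."""
    return [[sum(a[i][k] & b[k][j] for k in range(len(b))) % 2
             for j in range(len(b[0]))] for i in range(len(a))]

def check_relation(m):
    """True if M_SP,m * T_m equals M_Gray,m."""
    return matmul_gf2(natural_labels(m), t_matrix(m)) == gray_labels(m)
```

Calling `check_relation(m)` for small m confirms the identity for the label convention used here.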



2) QAM Constellations: Similar to the case of ASK/PSK
constellations, it is also possible to convert an SP labeling
into a Gray labeling by a linear transform in case of square
$M^2$-QAM constellations. Here, $M_{\mathrm{SP},2m}$ and $M_{\mathrm{Gray},2m}$ are
related by

$$
M_{\mathrm{SP},2m} \cdot (G_2 \otimes T_m) = M_{\mathrm{Gray},2m}
\tag{57}
$$

where $G_2$ equals the generator matrix of a length-2 polar code,
cf. (32). This relation is proven in Appendix B.
Fig. 7. Decoding trees for successive estimation of $u = [u_0, u_1, u_2, u_3]$
from $x = uT_4$ given reliability values $L(x_0), \dots, L(x_3)$. The known values
$u_0, u_1, u_2$ in the graphs correspond to LLR values of $\pm\infty$.



3) Successive Decoding of $T_m$: Since $T_m$ is a non-singular,
square binary $(m, m)$ matrix, it induces a channel
transform which is represented by the $m$-SBP

$$
\tau : (B_0 \times \dots \times B_{m-1}) \to \{B_\tau^{(0)}, \dots, B_\tau^{(m-1)}\}
\tag{58}
$$

that maps the vector channel of $m$ independent B-DMCs $B_i$
($0 \le i < m$) to an ordered set of different B-DMCs. By
construction of $\tau$, the concatenation $\lambda_{\mathrm{G}} \odot \tau$ is characterized
by a labeling rule $\mathcal{L}_{\lambda_{\mathrm{G}} \odot \tau} = \mathcal{L}_{\lambda_{\mathrm{SP}}}$, i.e., an SP labeling.

We now demonstrate that the transform induced by the
matrix $T_m$ can be reversed in a successive way at the receiver
side, just like the polar coding transform $\pi^n$ (which is induced by
$G_N$) under SC decoding. Let $x = uT_m$ be the Gray-labeled
representation of $u = [u_0, \dots, u_{m-1}]$ that is mapped to $M$-ary
ASK/PSK symbols and transmitted. As follows immediately
from the structure of $T_m$ (55), $x$ is given as

$$
x = [u_0 \oplus u_1,\; u_1 \oplus u_2,\; \dots,\; u_{m-2} \oplus u_{m-1},\; u_{m-1}].
\tag{59}
$$



Let us further assume that, at the receiver, reliability values
$L(x_0), \dots, L(x_{m-1})$ for the components of $x$, e.g., LLR
values, have been determined by using parallel decoding, like
in plain BICM. The components $u_i$ of $u$ ($0 \le i < m$)
can now be decoded successively from $x$, making use of the
reliability information on $x$ as well as of the already estimated
components $u_0, \dots, u_{i-1}$:

> Clearly, from (59), $u_0$ may be written as a sum of all
components of $x$:

$$
u_0 = \bigoplus_{i=0}^{m-1} x_i .
\tag{60}
$$

In a factor-graph notation, the decoding tree for estimating
$u_0$ simply consists of a check node of order
$m$, as visualized in Fig. 7(a). Therefore, given reliability
information on the components of $x$, this (Galois field)
sum can be evaluated by using the well-known operations
of information combining, cf., e.g., [16].
Fig. 8. Encoding graph for an optimized BICM polar code of length N = 8
with generator matrix $P_{4,2} \cdot (G_2 \otimes T_4)$ for a 16-ASK constellation. The
permutation $P_{4,2}$ has already been applied to u.



> The next component $u_1$ is represented by two independent
equations, making use of the knowledge of $u_0$:

$$
u_1 = x_0 \oplus u_0 = \bigoplus_{i=1}^{m-1} x_i .
\tag{61}
$$

Here, "independent" means that each code symbol $x_i$
appears in at most one of the equations. The corresponding
computation tree is shown in Fig. 7(b), involving two
check nodes and one variable node.

> The remaining components of $u$ are now determined one
after another in a similar way from the two independent
equations

$$
u_j = x_{j-1} \oplus u_{j-1} = \bigoplus_{i=j}^{m-1} x_i ,
\qquad j = 2, \dots, m-1 .
\tag{62}
$$
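The successive estimation described by (60)-(62) can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' decoder: LLRs are taken to be positive for bit value 0, the check-node operation uses the exact "box-plus" rule, each earlier decision is treated as perfectly known (LLR of ±∞, i.e., a sign flip), and all function names are hypothetical.

```python
import math

def boxplus(a, b):
    """LLR of the XOR of two independent bits with LLRs a and b (check node)."""
    return (math.copysign(1.0, a) * math.copysign(1.0, b) * min(abs(a), abs(b))
            + math.log1p(math.exp(-abs(a + b)))
            - math.log1p(math.exp(-abs(a - b))))

def encode_t(u):
    """x = u T_m, i.e. x_j = u_j xor u_{j+1} for j < m-1 and x_{m-1} = u_{m-1}."""
    m = len(u)
    return [u[j] ^ u[j + 1] if j + 1 < m else u[j] for j in range(m)]

def decode_t(llr_x):
    """Successively estimate u from the LLRs of the components of x."""
    m = len(llr_x)
    # suffix[j] is the LLR of x_j xor ... xor x_{m-1} (second equation in (61), (62)).
    suffix = [0.0] * m
    suffix[m - 1] = llr_x[m - 1]
    for j in range(m - 2, -1, -1):
        suffix[j] = boxplus(llr_x[j], suffix[j + 1])
    u_hat = []
    for j in range(m):
        total = suffix[j]
        if j > 0:
            # First equation: u_j = x_{j-1} xor u_{j-1}, with u_{j-1} already decided.
            sign = 1.0 - 2.0 * u_hat[j - 1]
            total += sign * llr_x[j - 1]   # variable node: add independent LLRs
        u_hat.append(0 if total >= 0.0 else 1)
    return u_hat
```

For j = 0 only the order-m check node contributes, matching (60); for j > 0 the two independent estimates are added, which corresponds to the variable node in the computation tree.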



D. Code Modification 

Employing $\tau$ in the construction of a length-$mN$ BICM
polar code, the overall channel transform is given by

$$
(\lambda_{\mathrm{G}} \odot \tau) \otimes \pi^n : W^N \to
\left\{ B^{(0)}_{(\lambda_{\mathrm{G}} \odot \tau) \otimes \pi^n}, \dots,
B^{(mN-1)}_{(\lambda_{\mathrm{G}} \odot \tau) \otimes \pi^n} \right\};
\tag{63}
$$

thus, the encoding process for this modified BICM polar code
is described by a generator matrix

$$
P_{m,N} \cdot (G_N \otimes T_m)
\tag{64}
$$

followed by Gray-labeled mapping to the transmit symbols.
Fig. 8 depicts an example of a length-8 BICM polar code
optimized for 16-ASK modulation that is described by the
generator matrix $P_{4,2} \cdot (G_2 \otimes T_4)$.
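To make the construction concrete, the sketch below builds $G_N \otimes T_m$ for m = 4 and N = 2 and compares it with the code generated by $G_N \otimes I_m$ under natural (SP) labeling. Several points are assumptions of this example: the permutation $P_{m,N}$ is omitted because its definition is not reproduced here, $G_N$ is taken as the plain Kronecker power of $G_2$ without bit reversal, and the grouping of m consecutive code bits per symbol as well as all function names are hypothetical.

```python
def kron_gf2(a, b):
    """Kronecker product of two 0/1 matrices."""
    return [[x & y for x in row_a for y in row_b] for row_a in a for row_b in b]

def identity(n):
    return [[int(r == c) for c in range(n)] for r in range(n)]

def polar_g(stages):
    """Kronecker power of G_2 = [[1, 0], [1, 1]] (no bit-reversal permutation)."""
    g = [[1]]
    for _ in range(stages):
        g = kron_gf2(g, [[1, 0], [1, 1]])
    return g

def t_matrix(m):
    """T_m with ones on the main diagonal and the first subdiagonal (assumed form)."""
    return [[1 if c == r or c == r - 1 else 0 for c in range(m)] for r in range(m)]

def encode(u, g):
    """Codeword u * g over GF(2)."""
    return [sum(u[i] & g[i][j] for i in range(len(u))) % 2 for j in range(len(g[0]))]

m, N = 4, 2
G_MOD = kron_gf2(polar_g(1), t_matrix(m))   # G_2 (x) T_4, permutation omitted
G_SP = kron_gf2(polar_g(1), identity(m))    # same code before the labeling transform

# Lookup tables from label (least significant bit first) to constellation point index.
NATURAL = {tuple((k >> b) & 1 for b in range(m)): k for k in range(2 ** m)}
GRAY = {tuple(((k ^ (k >> 1)) >> b) & 1 for b in range(m)): k for k in range(2 ** m)}

def symbols(codeword, table):
    """Map each group of m consecutive code bits to a constellation point index."""
    return [table[tuple(codeword[j * m:(j + 1) * m])] for j in range(N)]
```

Under these assumptions, `symbols(encode(u, G_MOD), GRAY)` and `symbols(encode(u, G_SP), NATURAL)` select the same constellation points for every input u, so only the bit-metric computation at the receiver distinguishes the two schemes.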

Interestingly, the transform (63), which is optimized for
BICM polar codes, and the optimal multilevel code defined
by the $mN$-SBP using SP labeling

$$
\lambda_{\mathrm{SP}} \otimes \pi^n : W^N \to
\left\{ B^{(0)}_{\lambda_{\mathrm{SP}} \otimes \pi^n}, \dots,
B^{(mN-1)}_{\lambda_{\mathrm{SP}} \otimes \pi^n} \right\}
\tag{65}
$$

share the same labeling rule and thus describe the same code,
i.e., identical binary source symbols are encoded to identical
transmission symbols in both cases. However, the decoding
strategies at the bit metrics calculation step differ for the two
approaches: In case of BICM, parallel decoding is used in
contrast to successive decoding in the MLC approach.

VI. Simulation Results 

We now give some numerical results in terms of rate-vs.- 
power-efficiency plots in order to illustrate the error perfor- 
mance of polar-coded modulation with SC decoding over the 
AWGN channel. 

Besides common Monte-Carlo simulations, we also present
results obtained by density evolution (DE) [15], [24], a method
that allows for approximate error performance analysis with
negligible numerical effort even for large code lengths. Here,
for multilevel polar codes, we numerically determine the bit
level capacities $I(B_\lambda^{(i)})$ ($0 \le i < m$) of the respective PAM
constellation, cf., e.g., [9]. Now, for each of the $m$ binary
component polar codes, a Gaussian channel with capacity
$I(B_\lambda^{(i)})$ is assumed as a transmission channel. The $mN$ bit
channel capacities, and the corresponding error probabilities
$p_e(B^{(i)}_{\pi^n})$, of the $m$ component polar codes are then
determined by performing density evolution (DE) with the
well-known Gaussian approximation [25], i.e., we simply
assume the output bit channels of each SBP in the chain
$\lambda \otimes \pi \otimes \dots \otimes \pi$ to be AWGN channels. Finally, from (50), the
maximum achievable code rate $R$ under successive decoding
given a target word error rate $\mathrm{WER}_{\max}$ is obtained. This
procedure is carried out for each value of the signal-to-noise
ratio $E_b/N_0$. Although the overall transmission channel is the
AWGN channel, this assumption certainly does not hold for the
bit channels occurring in the multistage decoding process.
Nevertheless, the inaccuracy induced by this Gaussian
assumption is small for multilevel polar codes, as shown in
Fig. 9.
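The Gaussian-approximated DE step can be sketched as below. This is a simplified illustration and not the authors' implementation: instead of capacities it tracks the mean of the LLR distribution (for BPSK over AWGN with noise variance $\sigma^2$ this mean is $2/\sigma^2$), it uses the curve-fit constants commonly quoted for the Gaussian approximation, and the rate selection (adding up the smallest bit error probabilities until the target is reached) is an assumed stand-in for (50), which is not reproduced here. All function names are hypothetical.

```python
import math

def phi(x):
    """Curve fit of phi(x) = 1 - E[tanh(L/2)] for L ~ N(x, 2x) (constants assumed)."""
    if x <= 0.0:
        return 1.0
    if x < 10.0:
        return math.exp(-0.4527 * x ** 0.86 + 0.0218)
    return math.sqrt(math.pi / x) * math.exp(-x / 4.0) * (1.0 - 10.0 / (7.0 * x))

def check_node(x):
    """LLR mean after combining two channels with LLR mean x at a check node."""
    p = min(phi(x), 1.0)
    target = p * (2.0 - p)          # equals 1 - (1 - phi(x))^2
    lo, hi = 0.0, x                 # the result does not exceed the input mean
    for _ in range(60):             # invert phi by bisection
        mid = 0.5 * (lo + hi)
        if phi(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def polarize(x0, n):
    """LLR means of the 2^n bit channels after n polarization steps."""
    means = [x0]
    for _ in range(n):
        means = [y for x in means for y in (check_node(x), 2.0 * x)]
    return means

def bit_error_probs(x0, n):
    """Error probability Q(sqrt(mean / 2)) of each bit channel."""
    return [0.5 * math.erfc(math.sqrt(x) / 2.0) for x in polarize(x0, n)]

def max_rate(x0, n, wer_max):
    """Largest rate whose summed bit error probabilities stay below wer_max (assumed rule)."""
    total, k = 0.0, 0
    for p in sorted(bit_error_probs(x0, n)):
        if total + p > wer_max:
            break
        total += p
        k += 1
    return k / 2 ** n
```

The curve fit is crude at very low LLR means, so the numbers for the worst bit channels should be read as rough indications only.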

Fig. 10 depicts the performance of multilevel polar codes
with 16-ASK modulation under SC decoding for different
labelings $\mathcal{L}_\lambda$ and various block lengths. The large performance
loss of Gray labeling w.r.t. SP labeling can clearly be observed.

For DE in the case of BICM polar codes, the bit channel
capacities from the first polarization steps
$I(B^{(i)}_{\lambda_{\mathrm{G}} \odot \pi^{\log_2(m)}})$ and
$I(B^{(i)}_{\lambda_{\mathrm{G}} \odot \tau})$, as in (52) and (58), respectively, have been
obtained by Monte-Carlo simulation, followed by
Gaussian-approximated DE for the component codes, as in the MLC
case. From Fig. 11, a significant performance gain for the
optimized code construction from Sec. V-D w.r.t. unmodified
BICM polar codes can be observed. However, due to the
suboptimality of the BICM approach, the performance of
multilevel polar codes is not achieved. Moreover, the inaccuracy
introduced by the Gaussian assumption for DE increases for
the BICM channels when compared to the MLC case, leading
to an additional loss.

Finally, Fig. 12 compares the performance of SP-labeled
multilevel polar codes to the BICM-based coding scheme used
in the DVB-T2 standard [26]. It is observed that multilevel
polar codes (under SC decoding) do not achieve the error
Fig. 9. $M$-ASK / AWGN, $M$ = 4, 16: Rate vs. SNR of multilevel polar
codes using SP labeling (blue) and Gray labeling (gray) obtained by DE
(continuous lines) as well as simulated values. Overall block length $mN$ =
512. Bold blue line: coded-modulation capacity, dashed black: Shannon bound
for real constellations.
Fig. 10. 16-ASK / AWGN: Rate vs. SNR of multilevel polar codes using
SP (blue) and Gray labeling (dashed gray). Overall block length (from right
to left) $mN = 2^k$, $k$ = 9, 11, 13, 15. Bold blue line: coded-modulation
capacity, dashed black: Shannon bound for real constellations.

performance of the DVB-T2 system consisting of a concatenation
of an LDPC code with a BCH code of equivalent overall
block length. On the other hand, multilevel polar codes are
decoded with a single-step, non-iterative decoding algorithm
that requires fewer information combining operations and thus
leads to a reduced computational complexity, compared to the
concatenated coding approach in DVB-T2.

VII. Conclusions 

In this paper, we have extended the binary polar coding
approach to higher-order digital $2^m$-ary modulation. We have
shown that the combination of multilevel coding and polar
coding results in a sequential binary channel partition (SBP)
of a vector channel into B-DMCs that can be successively
decoded, just like for the case of binary polar codes. The optimal
Fig. 11. 16-ASK / AWGN: Rate vs. SNR of multilevel polar codes using SP
labeling (blue) and BICM polar codes using the original construction (green)
and the proposed modified construction (red). Solid lines correspond to results
obtained by DE, markers to simulated values. Overall block length (from right
to left) $mN = 2^k$, $k$ = 9, 14. Bold blue line: coded-modulation capacity,
dashed gray: BICM capacity using Gray labeling, dashed black: Shannon
bound for real constellations.
Fig. 12. $M^2$-QAM / AWGN, $M^2$ = 4, 16, 256: Rate vs. SNR of multilevel
polar codes using SP labeling obtained by DE (blue solid
lines) and reference values for DVB-T2 [26] (green markers). Overall block
length: 65,536 (Polar Codes), 64,800 (LDPC+BCH). Bold blue: coded-modulation
capacity, black: Shannon bound.



choice of the binary labeling of the $2^m$ signal constellation
points has been discussed. Using BICM instead of MLC, we
have demonstrated, for the case of ASK, PSK and square
QAM constellations, that by a slight modification of the
polar code generator matrix, multilevel polar codes and BICM
polar codes can be transformed into each other. Although both
approaches may be designed to describe the same $2^m$-ary
code, for BICM a degradation w.r.t. the multilevel approach is
observed which is caused by the suboptimal parallel decoding
step at the bit metrics calculation in BICM.

Therefore, we conclude that for polar-coded modulation, the 
use of MLC should be preferred over BICM, if successive 
decoding is considered.

Appendix A 
Proof of Equation (20)

By definition, the variance of the bit channel capacities 
under the concatenation ip (g) ip is given by 

1 2 i=0 j=0 

Adding and subtracting the term X)i=Lo 1 ^(B^) 2 together 
with ( fl9] > leads to 

fc!-l 

2=0 j=0 

Finally, (O and yield 

fci-i 

W(w fe2 ) = ^(W) + F E V +Pf) ■ 

1 i=0 

Appendix B 
Proof of Equation (57)

We consider a square $M^2$-QAM constellation with labels
that are binary tuples of length $2m$ (with $m = \log_2(M)$) of
the form

$$
a := [a_{1,1}, \dots, a_{1,m}, a_{2,1}, \dots, a_{2,m}]
$$

where the first and last $m$ bits represent the naturally labeled
row and column indices, respectively. The application of the
transform $G_2 \otimes I_m$, with $I_m$ being the $(m, m)$ identity matrix,
leads to the following labels

$$
[(a_{1,1} \oplus a_{2,1}), \dots, (a_{1,m} \oplus a_{2,m}), a_{2,1}, \dots, a_{2,m}],
$$

i.e., the first $m$ bits of each label hold the component-wise
modulo-2 sum of row and column index. It is easily verified
that this labeling in fact represents a set-partitioning.

We will show now that this set-partitioned square $M^2$-QAM
constellation can be transformed into a Gray-labeled
constellation by a simple linear transform, just like for the
case of ASK/PSK.

Since the transform $G_2 \otimes I_m$ is obviously self-inverse, by
application to the SP-labeled constellation we obtain again

$$
a = [a_{1,1}, \dots, a_{1,m}, a_{2,1}, \dots, a_{2,m}] .
$$

From (55), the transform $I_2 \otimes T_m$ applies a (binary reflected)
Gray labeling independently to the (now naturally labeled) row
and column indices, which obviously describes a Gray-labeled
$M^2$-QAM constellation.
In summary,

$$
G_2 \otimes T_m = (G_2 \otimes I_m)(I_2 \otimes T_m)
$$

transforms an SP-labeled $M^2$-QAM constellation into a
Gray-labeled one, where $T_m$ denotes the linear transform from (55).
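The argument of this appendix can be replayed numerically for a small constellation. The sketch below is an illustration only; it assumes $G_2 = [[1, 0], [1, 1]]$, the bidiagonal form of $T_m$ implied by (59), least-significant-bit-first labels, and hypothetical function names.

```python
def bits(k, m):
    """Binary representation of k, least significant bit first."""
    return [(k >> b) & 1 for b in range(m)]

def gray_bits(k, m):
    """Binary reflected Gray label of k in the same bit order."""
    return bits(k ^ (k >> 1), m)

def t_matrix(m):
    """T_m with ones on the main diagonal and the first subdiagonal (assumed form)."""
    return [[1 if c == r or c == r - 1 else 0 for c in range(m)] for r in range(m)]

def kron_gf2(a, b):
    """Kronecker product of two 0/1 matrices."""
    return [[x & y for x in row_a for y in row_b] for row_a in a for row_b in b]

def vecmat_gf2(v, g):
    """Row vector times matrix over GF(2)."""
    return [sum(v[i] & g[i][j] for i in range(len(v))) % 2 for j in range(len(g[0]))]

G2 = [[1, 0], [1, 1]]

def qam_labels(m):
    """Return {(row, col): (sp_label, gray_label)} for a square 2^m x 2^m QAM grid."""
    identity = [[int(r == c) for c in range(m)] for r in range(m)]
    to_sp = kron_gf2(G2, identity)        # natural -> SP (self-inverse)
    to_gray = kron_gf2(G2, t_matrix(m))   # SP -> Gray, as in (57)
    out = {}
    for r in range(2 ** m):
        for c in range(2 ** m):
            sp = vecmat_gf2(bits(r, m) + bits(c, m), to_sp)
            out[(r, c)] = (sp, vecmat_gf2(sp, to_gray))
    return out
```

For each grid point, the second label returned by `qam_labels` equals the Gray label of the row index followed by the Gray label of the column index, so horizontally and vertically adjacent points differ in exactly one bit.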